Ridge Regression Example

This is a simple example of Ridge Regression using Python and the scikit-learn library.

Ridge Regression Overview

Ridge Regression is a linear regression technique that adds a regularization term to the loss function to reduce overfitting. It is particularly useful when the features in the dataset are highly correlated or when the number of features is close to or exceeds the number of observations. The regularization term, whose strength is set by the hyperparameter alpha, shrinks the coefficients toward zero and keeps them from becoming too large.

Key concepts of Ridge Regression:

Linear Regression: A linear model that predicts the target variable as a linear combination of input features.
L2 Regularization (Ridge Penalty): A term proportional to the sum of the squared coefficients is added to the linear regression loss function to penalize large coefficients.
Regularization Parameter (alpha): Controls the strength of the regularization. Higher alpha values result in stronger regularization.
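
The effect of alpha on the coefficients can be sketched with the closed-form ridge solution, w = (X^T X + alpha * I)^(-1) X^T y. This is a minimal illustration, not how scikit-learn necessarily solves the problem internally: the helper name ridge_closed_form, the demo data, and the choice to omit the intercept are all assumptions made here for brevity.

```python
import numpy as np

def ridge_closed_form(X, y, alpha):
    """Minimize ||y - Xw||^2 + alpha * ||w||^2 (no intercept, for brevity)."""
    n_features = X.shape[1]
    # Normal equations with the ridge penalty added to the diagonal
    A = X.T @ X + alpha * np.eye(n_features)
    return np.linalg.solve(A, X.T @ y)

# Hypothetical demo data: 50 samples, 3 features, known true weights
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 3))
true_w = np.array([2.0, -1.0, 0.5])
y_demo = X_demo @ true_w + 0.1 * rng.normal(size=50)

# Record the coefficient norm for each penalty strength
norms = {}
for alpha in [0.0, 1.0, 10.0, 100.0]:
    w = ridge_closed_form(X_demo, y_demo, alpha)
    norms[alpha] = np.linalg.norm(w)
    print(f"alpha={alpha:>5}: w={np.round(w, 3)}, ||w||={norms[alpha]:.3f}")
```

With alpha = 0 the penalty vanishes and the result is the ordinary least squares solution. As alpha grows, the norm of the coefficient vector shrinks.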

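Ridge Regression is commonly used when multicollinearity among features is present, because the penalty stabilizes coefficient estimates that ordinary least squares would otherwise spread unpredictably across the correlated features.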
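
To make the multicollinearity point concrete, the following sketch fits ordinary least squares and Ridge to two nearly identical features. The data, the variable names (x1, x2, X_corr, y_corr), and alpha=1.0 are illustrative assumptions, not part of the original example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

# Hypothetical data: x2 is almost an exact copy of x1
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.01 * rng.normal(size=n)
X_corr = np.column_stack([x1, x2])
y_corr = 3 * x1 + rng.normal(size=n)  # only x1 drives the target

# Fit both models on the same correlated features
ols = LinearRegression().fit(X_corr, y_corr)
ridge = Ridge(alpha=1.0).fit(X_corr, y_corr)

print("OLS coefficients:  ", ols.coef_)
print("Ridge coefficients:", ridge.coef_)
```

In a setup like this, the least squares fit cannot reliably tell the two features apart, so its individual coefficients tend to be large and erratic even though their sum stays near 3. Ridge should return a smaller coefficient vector that splits the weight more evenly between the two features.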

Python Source Code:

# Import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Generate synthetic data
np.random.seed(42)
X = 2 * np.random.rand(100, 1)
y = 4 + 3 * X + np.random.randn(100, 1)

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train Ridge Regression model with different alpha values
# (alpha=0 applies no penalty, so it is equivalent to ordinary least squares)
alpha_values = [0, 1, 10, 100]
plt.figure(figsize=(10, 6))

for alpha in alpha_values:
    ridge_model = Ridge(alpha=alpha)
    ridge_model.fit(X_train, y_train)
    
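    # Predict on the test set and report the mean squared error
    y_pred = ridge_model.predict(X_test)
    print(f'alpha={alpha}: test MSE = {mean_squared_error(y_test, y_pred):.3f}')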
    
    # Plot the model's predictions
    x_range = np.linspace(0, 2, 100).reshape(-1, 1)
    y_range = ridge_model.predict(x_range)
    plt.plot(x_range, y_range, label=f'Ridge Regression (alpha={alpha})')

# Plot the held-out test data points
plt.scatter(X_test, y_test, color='black', label='Test Data Points')
plt.title('Ridge Regression with Different Alpha Values')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.show()

Explanation:

Import Libraries: Import necessary Python libraries, including NumPy for numerical operations, Matplotlib for plotting, and scikit-learn for Ridge Regression.
Generate Synthetic Data: Generate 100 points from y = 4 + 3x plus standard normal noise, with x drawn uniformly from the interval 0 to 2.
Split Data: Split the data into a training set (80%) and a testing set (20%).
Train Ridge Regression Model: Train one Ridge Regression model for each alpha value (0, 1, 10, and 100).
Predict and Plot: Predict on the test set, then plot each model's fitted line over the range 0 to 2 together with the test points.
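
The plot shows the fitted lines flattening as alpha increases. The same effect can be read off numerically, as in the sketch below. It regenerates the synthetic data and split from the example above and prints the slope, intercept, and test error for each alpha. The slopes list and the print format are additions made here for illustration.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Same synthetic data and split as in the example above
np.random.seed(42)
X = 2 * np.random.rand(100, 1)
y = 4 + 3 * X + np.random.randn(100, 1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

slopes = []
for alpha in [0, 1, 10, 100]:
    model = Ridge(alpha=alpha).fit(X_train, y_train)
    # y is a column vector, so coef_ and intercept_ are arrays; flatten them
    slope = model.coef_.ravel()[0]
    intercept = np.ravel(model.intercept_)[0]
    mse = mean_squared_error(y_test, model.predict(X_test))
    slopes.append(slope)
    print(f"alpha={alpha:>3}: slope={slope:.3f}, intercept={intercept:.3f}, test MSE={mse:.3f}")
```

Because the penalty applies to the slope but not to the intercept, the slope should fall steadily below the true value of 3 as alpha grows, while the intercept rises to compensate.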